Detection of Prosodic Events Using Acoustic-prosodic Features and Part-of-speech Tags

Authors

  • Jan Buckow
  • Anton Batliner
  • Richard Huber
  • Heinrich Niemann
  • Elmar Nöth
  • Volker Warnke
Abstract

Prosody is used to improve the performance of the automatic speech translation system VERBMOBIL [8]. In our earlier work we developed efficient and robust word-based features that describe F0, energy, speaking rate, and pauses. These features were used to classify prosodic events. We achieved the best recognition results with 95-dimensional feature vectors that describe a context of ±2 words [4]. In the experiments presented in this paper we additionally used part-of-speech (POS) flags as features. The POS features are based on a hierarchical POS label system with up to 15 classes. The 95-dimensional acoustic-prosodic feature vectors are augmented with up to 105 POS features that describe a context of up to ±3 words. The new features significantly improved the recognition of phrase boundaries, phrase accents, and question mood; the recognition errors could be reduced by up to 16.7%. The POS flags allow a neural network (NN) to learn a simple language model. We show that it is important to include this syntactic knowledge during the classification of the acoustic-prosodic features instead of combining it later. This implies that there is some kind of synergy: the POS information helps to correctly classify the acoustic observations. The results presented in this paper provide an effective way to improve the recognition of prosodic events with almost no computational overhead.
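The feature augmentation described in the abstract can be illustrated with a small sketch. The dimensions (95 acoustic-prosodic features, up to 15 POS classes, a context of up to ±3 words, hence 7 × 15 = 105 flags) come from the abstract. Everything else is an assumption made for illustration: the one-hot encoding, the zero padding at utterance edges, the function names, and the toy POS ids are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

NUM_POS_CLASSES = 15   # from the abstract: up to 15 POS classes
CONTEXT = 3            # from the abstract: context of up to +/-3 words
ACOUSTIC_DIM = 95      # from the abstract: 95 acoustic-prosodic features


def pos_context_flags(pos_ids: Sequence[int], index: int,
                      context: int = CONTEXT,
                      num_classes: int = NUM_POS_CLASSES) -> List[float]:
    """Build one-hot POS flags for the words index-context .. index+context.

    Assumption: positions outside the utterance get an all-zero block.
    """
    flags: List[float] = []
    for offset in range(-context, context + 1):
        block = [0.0] * num_classes
        pos = index + offset
        if 0 <= pos < len(pos_ids):
            block[pos_ids[pos]] = 1.0
        flags.extend(block)
    return flags


def augment(acoustic: Sequence[float], pos_ids: Sequence[int],
            index: int) -> List[float]:
    """Append the POS flags to one word's acoustic-prosodic vector."""
    return list(acoustic) + pos_context_flags(pos_ids, index)


# Toy utterance of five words with hypothetical POS class ids.
pos_ids = [0, 4, 7, 4, 14]
acoustic = [0.0] * ACOUSTIC_DIM   # placeholder values, not real features
vector = augment(acoustic, pos_ids, index=2)
print(len(vector))  # 95 + 7 * 15 = 200
```

Feeding the combined vector to a single classifier, rather than merging separate acoustic and POS scores afterwards, corresponds to the early integration that the abstract argues for.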


Similar resources

A Study of the Relationship between Acoustic Features of “bæle” and the Paralinguistic Information

Language users benefit from special phonetic tools in order to communicate linguistic information as well as different emotional aspects and paralinguistic information through daily conversation. Having functions in conveying semantic information to listeners, prosodic features form the essential part of linguistic behaviour; manipulating them potentially can play an important role in transmitt...


The effect of bilateral subthalamic nucleus deep brain stimulation (STN-DBS) on the acoustic and prosodic features in patients with Parkinson’s disease: A study protocol for the first trial on Iranian patients

Background: The effect of subthalamic nucleus deep brain stimulation (STN-DBS) on the voice features in Parkinson’s disease (PD) is controversial. No study has comprehensively evaluated the voice features of PD patients who underwent STN-DBS using acoustic, perceptual, and patient-based assessments. Furthermore, no study has investigated prosodic features before and after DBS in PD. The curren...


Prosody recognition from speech utterances using acoustic and linguistic based models of prosodic events

A system for automatic recognition of prosodic events in speech utterances has been developed and applied to recognizing accent tones as defined by the tone and break index (ToBI) prosodic labeling standard. Both the acoustic and syntactic modeling portions of the system are described in the paper. The acoustic modeling portion of the system involves representation of ToBI labeled events using h...


Automatic Prosodic Events Detection by Using Syllable-Based Acoustic, Lexical and Syntactic Features

Automatic detection and annotation of prosodic events are important for both speech understanding and natural speech synthesis. In this paper, the complementary model method is proposed to detect prosodic events. This method discards the independence assumption between the acoustic features and the lexical and syntactic features, and models not only the features of the current syllable but also the con...


Detection of Non-Native Named Entities Using Prosodic Features for Improved Speech Recognition and Translation

In this work, we describe the use of acoustic-prosodic features to detect and localize non-native named entities spoken by a native speaker in the target language (English) for the purpose of improved speech recognition and translation. The exaggerated variation in accent and duration introduced by the speaker for non-native names is exploited in the detection process through the use of prosodi...


Publication date: 2000